On-Line Learning of Undirected Sparse n-grams
Abstract
n-grams are simple learning models considered state-of-the-art in many sequential domains. They suffer from an exponential number of parameters in their width, n. We introduce undirected sparse n-grams, which store probability estimates only for some n-tuples from the unconditional joint distribution, specifically the most frequent. Experimental results show this sparse version can outperform a narrower n-gram with the same number of parameters. In addition, our models predict equally well forward, backward, or in any combination. We present an on-line learning algorithm which induces both model parameters and structure (which patterns to include), despite a priori uncertainty about which patterns are frequent. Dynamic pattern inclusion complicates probability estimation during on-line learning, but we present and examine several solutions, including a Bayesian approach. Lastly, we describe multiwidth combinations of sparse n-grams that solve two important problems with the simple sparse models, and which are useful for hierarchical identification and composition of repeated substructure in data. These hierarchical sparse n-grams can be learned on-line with no fixed bound on width, using less memory than that taken by the training data.
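The core idea of the abstract, storing joint probability estimates only for the most frequent n-tuples and using them to predict in any direction, can be illustrated with a minimal sketch. This is not the on-line algorithm the abstract describes: it counts all windows in a single batch pass and then keeps the top k, which sidesteps the problem of dynamic pattern inclusion. The function names `train_sparse_ngram` and `predict` and the parameter `k` are assumptions made for illustration.

```python
from collections import Counter


def train_sparse_ngram(seq, n, k):
    """Batch simplification (not the on-line learner): count every width-n
    window, then keep joint probability estimates for the k most frequent."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    # Only k patterns are stored; all other n-tuples have no explicit estimate.
    return {pattern: c / total for pattern, c in counts.most_common(k)}


def predict(model, partial):
    """Undirected prediction: `partial` is a length-n tuple with None in the
    unknown slots. Returns a distribution over stored patterns that agree
    with the observed slots, renormalized over those patterns."""
    matches = {
        pattern: prob
        for pattern, prob in model.items()
        if all(obs is None or obs == sym for obs, sym in zip(partial, pattern))
    }
    mass = sum(matches.values())
    return {p: prob / mass for p, prob in matches.items()} if mass else {}


model = train_sparse_ngram("abcabcabd", n=3, k=3)
forward = predict(model, ("a", "b", None))   # predict the last symbol
backward = predict(model, (None, "b", "c"))  # predict the first symbol
```

Because the model stores unconditional joint estimates rather than conditional ones, the same table answers both the forward and the backward query; which slots are unknown is decided only at prediction time.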
Related resources
On-Line Cumulative Learning of Hierarchical Sparse n-grams
We present a system for on-line, cumulative learning of hierarchical collections of frequent patterns from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. We introduce a novel sparse n-gram model that, unlike pruned n-...
Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and the determination of their quality are very important. Various image processing algorithms have been applied in recent years to detect different agricultural products. This paper presents the problem of rice classification and quality detection based on model learning concepts includ...
The Main Eigenvalues of the Undirected Power Graph of a Group
The undirected power graph of a finite group $G$, $P(G)$, is a graph with the group elements of $G$ as vertices and two vertices are adjacent if and only if one of them is a power of the other. Let $A$ be an adjacency matrix of $P(G)$. An eigenvalue $\lambda$ of $A$ is a main eigenvalue if the eigenspace $\epsilon(\lambda)$ has an eigenvector $X$ such that $X^{t}\mathbf{j} \neq 0$, where $\mathbf{j}$ is the all-one...
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level image representations, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis, are employed to d...
Learning Classifiers for Assigning Protein Sequences to Gene Ontology Functional Families
Assigning putative functions to novel proteins and the discovery of sequence correlates of protein function are important challenges in bioinformatics. In this paper, we explore several machine learning approaches to data-driven construction of classifiers for assigning protein sequences to appropriate Gene Ontology (GO) function families using a class conditional probabilistic representation o...
Publication date: 2007